fix(addie): owner evaluate_agent_quality writes to canonical compliance state by bokelley · Pull Request #4250 · adcontextprotocol/adcp

bokelley · 2026-05-08T19:01:23Z

Refs #4247 — PR 1 of 4 in the compliance-state unification initiative.

Summary

Closes the 12-hour gap between an owner running evaluate_agent_quality (via Addie chat or the test_adcp_agent MCP tool) and the public /api/registry/agents/:url/compliance endpoint reflecting the result.

Root cause: evaluate_agent_quality wrote to agent_test_history / agent_contexts.last_test_* (not visible to the compliance API). The heartbeat is the only prior writer to the canonical tables, so owner-triggered results were invisible until the next 12-hour cron.

What this PR does (owner path only):

Checks whether the calling org owns the agent via member_profiles.agents @> $1::jsonb JOIN organization_memberships — identical to resolveAgentOwnerOrg in registry-api.ts:4733.
If owner AND compliance_opt_out is not set: calls complianceResultToDbInput() + complianceDb.recordComplianceRun() with triggered_by = 'owner_test' and dry_run = false.
Adds 'owner_test' to the TriggeredBy type and both triggered_by CHECK constraints (migration 471 — covers agent_compliance_runs and agent_storyboard_status).
Explicitly omits notifyComplianceChange to prevent Slack spam on iterative owner test runs.
Retains the legacy agentContextDb.recordTest() write for backward compatibility (deprecated in PR 3).

Known gap documented for follow-up: AAO Verified badge state is still updated only by the heartbeat (badge issuance calls processAgentBadges which is in the heartbeat loop, not in recordComplianceRun). Compliance status reflects owner test results immediately; badge re-issuance still requires the next heartbeat. Tracked in #4247.

Non-breaking justification: Adds an optional write path to existing canonical tables with a new triggered_by = 'owner_test' value. No fields removed, no types changed, no existing writes modified. The new enum value is additive; the constraint drop-and-recreate is non-destructive DDL. Existing consumers of the compliance API gain data, not schema changes.

Pre-PR review

code-reviewer: approved after fixes — ownership query strengthened to join organization_memberships, compliance_opt_out guard added, dry_run: false comment added, log level on owner-check failure raised to warn
internal-tools-strategist: approved — ownership check (member_profiles.agents @> $1::jsonb) matches established codebase pattern; notifyComplianceChange suppression is correct per notification subscriber scope (Slack + external consumers via Scope3); badge-issuance gap acknowledged as follow-up

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

Push fixup commits directly: gh pr checkout <num> →
fix → push.

Or re-trigger: comment /triage execute on the source
issue.

See #3121
for context.

Session: https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

Generated by Claude Code

EmmaLouise2018 · 2026-05-08T19:16:50Z

Reviewing this against the updated #4247 plan. Right shape for PR 1 — owner-only canonical writes + triggered_by = 'owner_test' + compliance_opt_out guard + Slack-noise suppression all match the design. Two asks before merge to keep the public contract honest:

1. `verdict_source` field on the compliance response (load-bearing)

/api/registry/agents/:url/compliance is a public registry surface. What it reflects is changing, even though the field names aren't:

Pre-PR: "this agent's last scheduled verdict" (heartbeat-only).
Post-PR: "this agent's last verdict from any source" (heartbeat + owner_test).

A consumer scraping this endpoint daily learned heartbeat history; post-PR they'll see any verdict, possibly an owner running it on demand to test a fix. Tell the caller, don't guess silently. Add verdict_source: 'heartbeat' | 'owner_test' to the response so downstream filters correctly. Reads off the triggered_by of the latest agent_compliance_runs row joined to agent_compliance_status — same source that's already populated in this PR.

The OperatorLookupResultSchema analog (AgentComplianceDetailSchema or whichever is canonical for /compliance) gets the additive optional field. Frozen reporting contract argument applies; don't change semantics on the wire without a flag.

Alternative: a SHOULD-clause in the changeset spelling out the semantic shift for downstream scrapers. Pick one. Don't ship neither.

2. Last-write-wins race test (pin the contract)

Heartbeat and owner_test can fire within minutes. Two concurrent writers, one row in agent_compliance_status. The implicit rule under the current code is last-write-wins on (agent_url). State it. A test pins the rule:

// heartbeat-style write at T → status reflects T
// owner_test write at T+1s → status reflects T+1s
// owner_test at T+2s, heartbeat at T+3s → status reflects T+3s

A future refactor that switches to "first-write-wins" or "merge" would silently change the public contract. The test is the contract.

Lineage

Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247 — full unification plan (this PR is PR 1 of 4)
fix(registry): heartbeat picks up member-registered agents — write-side seed + read-side CTE #4252 — heartbeat-picks-up-member-agents (sibling fix on the visibility-miss bug class)
fix(registry): require declared agent type + auto-backfill primary_brand_domain #4235 — type-required + brand-domain backfill (the prior visibility-miss fix)

Approving on the strength of the owner-only canonical write pattern plus compliance_opt_out guard. Two asks above before merge — both small. The badge-issuance gap acknowledged in the PR description is the right deferral; that's an issue against #4247 to track.

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

bokelley · 2026-05-08T19:26:16Z

Both asks addressed in the fixup commit (0c996a6):

1. verdict_source on /api/registry/agents/:url/compliance

AgentComplianceDetailSchema gets a new optional field:

verdict_source: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null

getComplianceStatus and bulkGetComplianceStatus now include a LATERAL subquery against agent_compliance_runs (dry_run = false, ORDER BY tested_at DESC LIMIT 1) to surface triggered_by as last_triggered_by on AgentComplianceStatus. The endpoint emits it as verdict_source. No migration needed — the data is already there.

2. Last-write-wins contract test

New compliance-db-last-write-wins.test.ts (4 cases) pins:

Every recordComplianceRun call issues ON CONFLICT (agent_url) DO UPDATE — never DO NOTHING.
triggered_by is forwarded verbatim for both heartbeat and owner_test sources.
getComplianceStatus returns last_triggered_by from the most recent non-dry run (asserts dry_run = false and ORDER BY tested_at DESC LIMIT 1 are in the SQL).

A future switch to first-write-wins or source-priority filtering breaks these tests.

All 3302 unit tests pass.

Session: https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

Generated by Claude Code

bokelley · 2026-05-09T00:02:20Z

Code review (expert pass): solid root, ready to merge after stripping committed dist/ files.

Block:

dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} are checked in. Generated artifacts shouldn't be in the diff. Remove and amend.

Nits (non-blocking):

server/src/addie/mcp/member-tools.ts:3578 — comment says "fall through to the legacy agent_test_history path below" but fix(addie): backfill owner test history + stop dual-write for owner runs #4264 makes that conditional. Will be stale by the time fix(addie): backfill owner test history + stop dual-write for owner runs #4264 lands.
compliance-db-last-write-wins.test.ts mocks query/getClient. Repo convention is real-Postgres tests. Acceptable as a SQL-string contract test; flag for follow-up.

After dist strip, mark ready and the chain unblocks (#4263 is clean too — see #4264 for the real chain blocker).

bokelley · 2026-05-09T00:04:11Z

Blocker addressed — pushed c71809a0.

Removed dist/schemas/onboarding-openapi.{js,d.ts,js.map,d.ts.map} via git rm.
Added .gitignore rules for dist/schemas/*.{js,js.map,d.ts,d.ts.map} so build artifacts in that directory can't sneak back in.

Nits noted (not fixed):

member-tools.ts:3578 comment — will be stale once fix(addie): backfill owner test history + stop dual-write for owner runs #4264 lands. Flagged for whoever owns fix(addie): backfill owner test history + stop dual-write for owner runs #4264 to clean up on merge.
compliance-db-last-write-wins.test.ts mock pattern — acknowledged as a SQL-string contract test deviation from real-Postgres convention; follow-up tracked in the PR body.

Ready to mark out of draft when you are.

Triaged by Claude Code. Session: https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

Generated by Claude Code

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.

…o keys (#4364) * fix(compliance): rewrite deriveStoryboardStatuses for SDK 6.x scenario keys The compliance heartbeat has been writing zero rows to agent_storyboard_status since the SDK switched comply() to storyboard- driven testing. The SDK emits one TestResult per phase of each storyboard, keyed `<storyboard_id>/<phase_id>` in result.tracks[].scenarios[].scenario (see @adcp/sdk compliance/storyboard-tracks.ts). The old implementation walked the YAML's per-step `comply_scenario` field (bare names like `signals_flow`, `capability_discovery`) and looked them up in the SDK's scenario map. Every lookup missed → testedCount === 0 → every storyboard skipped at the `continue` guard. Effect across the registry: agent_storyboard_status total rows: 6 (across 4 agents) rows written by triggered_by='heartbeat': 0 rows surviving were legacy bare-name keys from old manual runs This silently broke the AAO Verified badge pipeline (no storyboard rows → deriveVerificationStatus has nothing to verify against) and every agent's dashboard `storyboards_passing: 0 / N` was misleading: the runner wasn't failing storyboards, the parser was dropping them. Surfaced by escalation #329: Evgeny's agent was running 30/30 scenarios clean but showing `degraded` because specialism_status.signal-owned read 'untested' from a never-populated agent_storyboard_status row. Fix: read SDK output directly. Group scenarios by storyboard id, roll per-step pass counts up from each phase's `steps` array, fall back to phase-level counts when steps are absent. The `storyboardIds` override is preserved for explicit-IDs callers that need an `untested` entry when the runner didn't run a requested storyboard. The unused YAML `comply_scenario` field is no longer load-bearing for status mapping (the SDK already knows which storyboards it ran). Tests: 9 cases covering all-pass, partial, all-fail, phase-only fallback, legacy bare-name skip, empty input, and explicit-IDs untested gap. Stack note: this is orthogonal to Emma's #4247 compliance-state unification stack (#4250, #4263, #4264, #4268, #4274) which collapses agent_test_history into agent_compliance_runs. Different files; rebases cleanly in either order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(scripts): test-comply-storyboard-statuses — local harness for the fix Runs comply() against an agent URL and prints what deriveStoryboardStatuses would produce, without DB writes. Used to validate the SDK-6.x scenario-key fix against real agents (adcp-signals-adaptor.evgeny-193.workers.dev/mcp and wonderstruck.sales-agent.scope3.com/mcp) before merging. Will stay useful for future SDK upgrades that touch scenario emission or storyboard-track aggregation — same pattern as the diagnose-agent-comply-queue script from #4361. Usage: npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <agent-url> [<agent-url> ...] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(compliance): code review nits — clarify steps doc, hoist explicit-ids check, add 3 edge tests Addresses code-reviewer feedback on PR #4364: - JSDoc on deriveStoryboardStatuses now calls out that steps_passed/total are not directly comparable across rows (some rows are real step counts, some are phase-level fallbacks when the SDK omits per-step data). - Comment pinning the storyboard-id invariant (flat ids, no `/`) so the indexOf split stays correct as new storyboards land. - Defensive `result.tracks ?? []` so a malformed result doesn't throw. - Hoist `storyboardIds && length > 0` into a single `hasExplicitIds` const used at both the toEmit decision and the no-data fallback. - Three new test cases: * same storyboard split across multiple tracks aggregates correctly * result.tracks absent → [] * non-string scenario values (null, number) → skipped without throwing 12/12 vitest passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bokelley · 2026-05-11T08:56:00Z

Heads up — PR #4364 just merged to main, which fixes deriveStoryboardStatuses to read SDK 6.x's <storyboard_id>/<phase_id> scenario keys. Pre-#4364, every call to complianceResultToDbInput → deriveStoryboardStatuses produced an empty storyboard_statuses array (so agent_storyboard_status had zero heartbeat-triggered rows registry-wide).

Without #4364 underneath, this PR's owner-test write path would have inherited the same bug — owner runs from Addie would have written canonical agent_compliance_runs rows but with empty storyboard arrays, so the public registry would still show storyboards_passing: 0/N for healthy agents tested by their owner.

With #4364 in main: owner-test writes will produce real per-storyboard data on first run after this PR ships. Worth a rebase before merge so CI exercises both fixes together.

Linking #4364 here for the chain: #4364 → #4250 → #4263 → #4264 → #4268 → #4274.

…ce state Closes the 12-hour gap between owner-triggered storyboard runs and the public /api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4). When evaluate_agent_quality is triggered by the agent owner, the result is now written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history (deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK constraints. notifyComplianceChange is intentionally suppressed for owner runs to prevent iteration-loop Slack spam. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb

@EmmaLouise2018

…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ

…rtifacts Generated JS/TS files don't belong in source control. Also adds .gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence. https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

…tion_queue on main)

Six changes responding to expert review: 1. Renumber migration 472 → 475 (collision with 472_drop_member_profiles_ primary_brand_domain.sql on main since this PR was opened). 2. Drop `verdict_source` from the public /api/registry/agents/:url/compliance response. Heartbeat and owner_test both call comply() against the same registered URL with the same owner-saved credentials; the verdict's truth content is identical regardless of who pulled the trigger. Surfacing the source label to buyers creates a trust distinction the underlying observation doesn't carry. `triggered_by` stays as internal audit on agent_compliance_runs; Emma's #4263 dashboard surface keeps the operator UX cue. 3. Drop the legacy `recordComplianceRun(... 'manual')` write inside evaluate_agent_quality. Pre-PR, that write was invisible publicly. Post-PR (with verdict_source on the response), it would have leaked `verdict_source: 'manual'` for non-owner runs — security-reviewer's B1. Even with verdict_source removed (item 2), the legacy write was gated only on `agent_contexts` row existence — which `save_agent` lets any org create for any URL without an ownership check — so a non-owner could publish a verdict on someone else's agent. The owner-test branch covers the dashboard-freshness use case with a real ownership check; the legacy public-state write has no remaining function. Legacy agent_test_history write retained as session-scoped audit until Emma's PR 3 cleans up. 4. Dashboard "Run this storyboard" endpoint now writes `triggered_by = 'owner_test'` instead of `'manual'` (owner-only path, consistent with the new semantic). Legacy `'manual'` value remains in the enum for historical rows. 5. Rate limit on evaluate_agent_quality via existing Addie tool rate limiter (default 60/10min). comply() itself takes 10-60s per run so the natural ceiling is already ~1-2/min — this adds a hard wall above that. Security-reviewer's B2. 6. Updated changeset content + migration number references; replaces stale "migration 471" prose from earlier renumber. Out of scope (deferred follow-ups): - Badge issuance on owner_test transitions: owner currently waits up to 1h for next heartbeat to re-issue badge. Skipped because the badge fan-out in compliance-heartbeat.ts is ~70 lines and extracting/sharing is its own refactor. - Extracting resolveAgentOwnerOrg to a shared helper: same drift-surface call applies, kept inline to bound this PR's scope. - Active-membership filter: organization_memberships has no status column (row deletion = removal), so the security-review M2 concern doesn't apply at the schema layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bokelley · 2026-05-11T09:30:36Z

Review iteration — addressed security + correctness feedback

Pushed 828ea1d with the following changes:

Blockers

✅ Migration renumbered 472 → 475 (collision with 472_drop_member_profiles_primary_brand_domain.sql on main)
✅ Changeset updated (was referencing stale "migration 471")
✅ Security B1 — Dropped the legacy recordComplianceRun(... 'manual') write inside evaluate_agent_quality. That path was gated only on agent_contexts row existence (which save_agent lets any org create for any URL with no ownership check), so a non-owner could publish a verdict_source: 'manual' on someone else's agent.
✅ Security B2 — Rate limit on evaluate_agent_quality via existing Addie tool rate limiter (default 60/10min). comply() itself takes 10-60s per run.

Trust-model fix

✅ Dropped verdict_source from the public /api/registry/agents/:url/compliance response entirely. Discussion with Brian: heartbeat and owner_test both call comply() against the same registered URL with the same owner-saved credentials; the verdict's truth content is identical regardless of who pulled the trigger. Exposing the source label to buyers creates a trust distinction the underlying observation doesn't actually carry. triggered_by stays as internal audit on agent_compliance_runs; Emma's feat(dashboard): surface verdict_source + per-run triggered_by badge #4263 dashboard surface keeps the operator UX cue.

Consistency fix

✅ Dashboard "Run this storyboard" endpoint now writes triggered_by = 'owner_test' instead of 'manual' (owner-only path; consistent with the new semantic).

Out of scope — follow-ups, not part of this PR

Badge issuance on owner_test transitions: owner currently waits up to 1h for next heartbeat to re-issue badge. The badge fan-out in compliance-heartbeat.ts is ~70 lines and sharing it requires extracting to a helper — its own refactor.
Extracting resolveAgentOwnerOrg to a shared helper: real drift surface, but keeping inline to bound this PR's scope.

N/A

Active-membership filter: organization_memberships has no status column (row deletion = removal), so security-review M2 doesn't apply at the schema layer.

Ready for re-review.

bokelley · 2026-05-11T09:42:20Z

Re-review (post-828ea1d)

Two experts (code-reviewer + security-reviewer) reviewed the updated diff. No ship-blockers.

Confirmed addressed:

B1 (unguarded 'manual' write): member_profiles.agents @> $2::jsonb JOIN organization_memberships closes the hole — an org with only an agent_contexts row for a URL they don't own will fail the ownership query. compliance_opt_out guard is present on the canonical-write path.
B2 (rate limit): checkToolRateLimit at the handler top is correct; the 10–60 s/run natural ceiling means the hard wall is belt-and-suspenders.
Trust model: last_triggered_by is NOT serialized in the public /api/registry/agents/:url/compliance route handler (confirmed — that handler enumerates fields explicitly). Decision and rationale well-documented in the comment block.
CODE_VERSION: bumped to 2026.05.1 ✓
Both getComplianceStatus and the bulk variant updated with the LATERAL join ✓

Nits (non-blocking, your call):

Migration 474 gap. main tops at 473_users_primary_organization_id_fk.sql; the PR uses 475_, permanently skipping 474_. If this was intentional — avoiding a known in-flight PR — no action needed. Otherwise, rename to 474_.
workosUserId shadow. The const workosUserId = ... inside if (organizationId) re-declares the same value already in scope from the rate-limit check above. Delete the inner declaration and use the outer one.
LATERAL missing partial index. The existing (agent_url, tested_at DESC) index doesn't filter on dry_run; Postgres will scan all rows for the agent before satisfying LIMIT 1 once dry-run rows accumulate. Adding to migration 475:
```
CREATE INDEX CONCURRENTLY idx_compliance_runs_nondry
  ON agent_compliance_runs(agent_url, tested_at DESC)
  WHERE dry_run = false;
```
backs the LATERAL efficiently without a table lock.
owner_test visible in run history. The public /api/registry/agents/:url/compliance/history endpoint emits triggered_by per run, so external callers can see when an owner self-tested. Low-signal disclosure (reveals "owner ran a test," not anything beyond the already-public verdict), but worth confirming the behavior is intended.

Triaged by Claude Code. Session: https://claude.ai/code/session_014mrjryUmFVVrB2BzDhgMZR

Generated by Claude Code

PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.

…4263) PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.

closes #4377) (#4388) PR #4250's review flagged the agent-ownership JOIN as a drift surface — duplicated inline in two places with subtly different semantics: - registry-api.ts:4825 `resolveAgentOwnerOrg(userId, agentUrl)` — "any org the user is a member of that owns the agent" - member-tools.ts:3588 (inline) — "the resolved member-context org IS the owning org" (tighter predicate that adds the org_id constraint) Both queries join `member_profiles.agents` against `organization_ memberships` — same canonical relation, different selectivity. This PR extracts both to `server/src/services/agent-ownership.ts`: - `findOwnerOrgForUser(userId, agentUrl): Promise<string | null>` — registry-api.ts's "discover ownership" use case - `isOrgOwnerOfAgent(orgId, userId, agentUrl): Promise<boolean>` — member-tools.ts's "confirm specific org" use case Both call sites updated to use the helpers. `resolveAgentOwnerOrg` is retained as a closure-scoped alias inside the registry-api factory so existing call sites don't need to thread the import. Note on active-membership filtering: `organization_memberships` has no status column in this schema — removed members get their row deleted, not status-flipped. Row existence is the membership signal. Documented in the helper file's module doc. Tests (7 passing): - findOwnerOrgForUser returns org_id, null on no match, null on throw - isOrgOwnerOfAgent returns true/false correctly - semantic distinction pinned: findOwnerOrgForUser uses 2 params (no org filter); isOrgOwnerOfAgent uses 3 params (org_id constraint) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bokelley added the claude-triaged Issue has been triaged by the Claude Code triage routine. Remove to re-triage. label May 8, 2026

bokelley mentioned this pull request May 8, 2026

Unify compliance state: every storyboard run writes to one canonical path (heartbeat + Addie + dashboard tests) #4247

Open

12 tasks

EmmaLouise2018 force-pushed the claude/issue-4247-owner-test-canonical-write branch from c71809a to 42e7f37 Compare May 9, 2026 00:09

bokelley mentioned this pull request May 11, 2026

fix(compliance): rewrite deriveStoryboardStatuses for SDK 6.x scenario keys #4364

Merged

claude and others added 5 commits May 11, 2026 05:07

chore(dist): remove accidentally committed onboarding-openapi build a…

7c3931f

…rtifacts Generated JS/TS files don't belong in source control. Also adds .gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence. https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx

fix(migrate): renumber 471 → 472 (resolve clash with manager_revalida…

6d74c5d

…tion_queue on main)

fix(addie): address Brian's review — DDL lock guard + cross-org gap note

f7f933b

bokelley force-pushed the claude/issue-4247-owner-test-canonical-write branch from 54a4f16 to f7f933b Compare May 11, 2026 09:10

bokelley marked this pull request as ready for review May 11, 2026 09:45

bokelley merged commit 45f3371 into main May 11, 2026
15 checks passed

bokelley deleted the claude/issue-4247-owner-test-canonical-write branch May 11, 2026 09:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250

fix(addie): owner evaluate_agent_quality writes to canonical compliance state#4250
bokelley merged 6 commits intomainfrom
claude/issue-4247-owner-test-canonical-write

bokelley commented May 8, 2026

Uh oh!

EmmaLouise2018 commented May 8, 2026

Uh oh!

bokelley commented May 8, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 11, 2026

Uh oh!

bokelley commented May 11, 2026

Uh oh!

bokelley commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

bokelley commented May 8, 2026

Summary

Pre-PR review

Uh oh!

EmmaLouise2018 commented May 8, 2026

1. verdict_source field on the compliance response (load-bearing)

2. Last-write-wins race test (pin the contract)

Lineage

Uh oh!

bokelley commented May 8, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 9, 2026

Uh oh!

bokelley commented May 11, 2026

Uh oh!

bokelley commented May 11, 2026

Review iteration — addressed security + correctness feedback

Uh oh!

bokelley commented May 11, 2026

Re-review (post-828ea1d)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. `verdict_source` field on the compliance response (load-bearing)